Local information transfer as a spatiotemporal filter for complex systems 
We present a measure of local information transfer, derived from an existing averaged information-theoretical measure, namely transfer entropy. Local transfer entropy is used to produce profiles of the information transfer into each spatiotemporal point in a complex system. These spatiotemporal profiles are useful not only as an analytical tool, but also allow explicit investigation of different parameter settings and forms of the transfer entropy metric itself. As an example, local transfer entropy is applied to cellular automata, where it is demonstrated to be a novel method of filtering for coherent structure. More importantly, local transfer entropy provides the first quantitative evidence for the long-held conjecture that the emergent traveling coherent structures known as particles (both gliders and domain walls, which have analogues in many physical processes) are the dominant information transfer agents in cellular automata.
I. INTRODUCTION



Information transfer is widely considered to be a vital component of complex nonlinear behavior in spatiotemporal systems. Examples include particles in cellular automata (CAs) [1-7], self-organization caused by dipole-dipole interactions in microtubules [8], soliton dynamics and collisions [9], wave-fragment propagation in Belousov-Zhabotinsky media [10], solid-state phase transitions in crystals [11], the influence of intelligent agents over their environments [12], and inducing emergent neural structure [13]. The very nature of information transfer in complex systems is a popular topic in itself. For example, there are conflicting suggestions that information transfer is maximized in complex dynamics [14, 15], or alternatively at an intermediate level, with maximization leading to chaos [5, 16]. Yet while the literature contains many measures of complexity (e.g. [...]), quantitative studies of information transfer are comparatively absent.

Information transfer is popularly understood in terms of the aforementioned recognized instances, which suggest a directional signal or communication of dynamic information between a source and receiver. Defining information transfer as the dependence of the next state of the receiver on the previous state of the source [18] is typical. However, this definition is incomplete according to Schreiber's criteria [13], which require the definition to be both directional and dynamic. In this paper, we accept Schreiber's definition [19] of (predictive) information transfer as the average information contained in the source about the next state of the destination that was not already contained in the destination's past. This definition results in the measure for information transfer known as transfer entropy [19], quantifying "the statistical coherence between systems evolving in time" in a directional and dynamic manner.

We derive a measure of local information transfer from this existing averaged information-theoretical measure, transfer entropy. Local transfer entropy characterizes the information transfer into each spatiotemporal point in a given system, as opposed to a global average over all points in an information channel. Local metrics within a global average are known to provide important insights into the dynamics of nonlinear systems [20]. Here, the local transfer entropy provides spatiotemporal profiles of information transfer, which are useful analytically in highlighting or filtering "hot-spots" in the information channels of the system. The local transfer entropy also facilitates close study of different forms and parameters of the averaged metric. In particular, it shows the importance of conditioning on the past history of the information destination, and the possibility of conditioning on other information sources. Importantly, through these applications the local transfer entropy provides insights that the averaged transfer entropy cannot.

We apply local transfer entropy to cellular automata (CAs). These are discrete dynamical systems consisting of an array of cells, each of which synchronously updates its state as a function of the states of a fixed number of spatially neighboring cells using a uniform rule. CAs are a classic example of complex behavior, and have been used to model a wide variety of real-world phenomena (see [3]). In particular, we examine elementary CAs (ECAs): 1D CAs using binary states, deterministic rules and one neighbor on either side (i.e. cell range $r = 1$). (For more complete definitions, including that of the Wolfram rule number convention for describing update rules, see [21].)

CAs are selected for experimentation here because they have been the subject of a large body of work regarding the qualitative nature of information transfer in complex systems [...]. As will be described here, there are well-known spatiotemporal structures in CAs which are qualitatively widely accepted as being information transfer agents. This provides us with a useful basis for interpreting the quantitative results of our application.

The aforementioned studies revolve around emergent structure in CAs: particles, gliders and domains. A domain may be understood as a set of background configurations in a CA, any of which will update to another such configuration in the absence of a disturbance. Domains are formally defined within the framework of computational mechanics [13] as spatial process languages in the CA. Particles are qualitatively considered to be moving elements of coherent spatiotemporal structure, in contrast to a background domain (see [23] for a discussion of the term "coherent structure" referring to particles in this context). Gliders are particles which repeat periodically in time while moving spatially (repetitive non-moving structures are known as blinkers). Formally, particles are defined as a boundary between two domains [22]. As such, they can also be termed domain walls, though this term is typically used with reference to aperiodic particles.

It is widely suggested that particles form the basis of information transmission, since they appear to facilitate communication about the dynamics in one area of the CA to another area (e.g. [...]). Furthermore, their interactions or collisions are suggested to form the basis of information modification, since the collisions appear to combine the communications in some decision process about the dynamics. These metaphors are found in several lines of work:
- studies of Turing universal computation, with particles used to facilitate the transfer of information between processing elements (e.g. Conway's Game of Life [23], and see the general discussion in [...]);
- analyses of CAs performing intrinsic, universal or other specific computation [..., 22, 25];
- studies of the nature of particles and their interactions (i.e. collisions) [...];
- attempts to automatically identify CA rules which give rise to particles, e.g. [1, ...], suggesting these to be the most interesting and complex CA rules.

Despite such interest, no study has quantified the information transfer on average within specific channels, or at specific spatiotemporal points, in a CA. Nor has any study quantitatively demonstrated that particles (either in general, or gliders or domain walls as subclasses) are in fact information transfer agents. (A rudimentary attempt was made via mutual information in [5]. However, we show that this is a symmetric measure that does not capture directional transfer.)

We hypothesize that applying a measure of local information transfer into each spatiotemporal point in CAs would reveal particles as the dominant information transfer agents. Our results would have wide-ranging implications for the real-world systems mentioned earlier, for two reasons. CAs are powerful model systems of the real world. There is also an obvious analogy between particles in CAs and coherent spatiotemporal structures and hypothesized information transfer agents in other systems. Known analogues of particles exist in physical processes such as pattern formation and solitons [3, ...], and waves of conformational change are said to perform signaling in microtubules [8]. Where no CA model exists for a given system, our presentation of local transfer entropy is generic enough to still be directly applicable for investigating that system, guided by the method of application to CAs.

Finally, several methods already exist for filtering the important structural elements in CAs [..., 28]. These provide another important basis for comparison with our spatiotemporal local information transfer profiles, which can also be viewed as a method of filtering. These methods include:
- finite state transducers to recognize the regular spatial language of the CA [13, 25];
- local information (i.e. local spatial entropy rate) [28];
- displaying executing rules with the most frequently occurring rules filtered out [6];
- local statistical complexity and local sensitivity [23].

All of these successfully highlight particles. Hence, filtering is not a new concept. However, the ability to filter for information transfer could provide the first thoroughly quantitative evidence that particles are the information transfer elements in CAs. Additionally, it would provide insight into information transfer in each specific channel or direction in the CA. This allows more refined investigation than the single measures of other methods, and should reveal interesting differences in the parts of the structures highlighted.

We begin by providing background on required 
information-theoretical concepts, and subsequently in- 
troduce transfer entropy and derive the local transfer 
entropy from it. We also derive two distinct forms of 
the transfer entropy, namely apparent and complete, to 
be studied from a local viewpoint. The local transfer 
entropy is then applied to ECAs, highlighting particles 
(both gliders and domain walls) as expected, and so 
providing the first quantitative evidence for the widely- 
accepted conjecture that these are the dominant informa- 
tion transfer entities in CAs. The profiles also provide 
insights into the parameters and forms of the transfer 
entropy that its average is shown to be incapable of pro- 
ducing. We conclude with a summary of the important 
findings, compare our spatiotemporal profiles to other 
CA filtering methods, and describe further investigations 
we intend to perform with this metric. 



II. INFORMATION-THEORETICAL QUANTITIES

Information theory (e.g. see [29]) has proved to be a useful framework for the design and analysis of complex self-organized systems (for example, see an overview in [30] and specific examples in [1, ...]). Several factors underlie its position as a leading framework for the analysis and design of complex systems: this success, the highly abstract nature of information theory (which renders it portable between different types of complex systems), and its general ease of use.

The fundamental quantity is the Shannon entropy, which represents the uncertainty associated with any measurement $x$ of a random variable $X$ (logarithms are in base 2, giving units in bits): $H(X) = -\sum_x p(x) \log p(x)$. The conditional entropy of $X$ given $Y$ is the average uncertainty that remains about $x$ when $y$ is known: $H(X|Y) = -\sum_{x,y} p(x,y) \log p(x|y)$. The mutual information between $X$ and $Y$ measures the average reduction in uncertainty about $x$ that results from learning the value of $y$, or vice versa:

$$I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}, \tag{1a}$$

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X). \tag{1b}$$
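As a minimal sketch of these definitions, the plug-in estimators below replace each probability by an observed frequency. This is an illustration only, not code from this work, and the function names are hypothetical. The final line checks the identity of Eq. (1b) on a small sample.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy H(X) in bits: -sum_x p(x) log2 p(x)."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())


def conditional_entropy(xs, ys):
    """H(X|Y), computed by the chain rule as H(X,Y) - H(Y)."""
    return entropy(list(zip(xs, ys))) - entropy(ys)


def mutual_information(xs, ys):
    """I(X;Y) = sum_{x,y} p(x,y) log2[p(x,y) / (p(x) p(y))], Eq. (1a)."""
    n = len(xs)
    p_xy, p_x, p_y = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        c / n * log2((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
        for (x, y), c in p_xy.items()
    )


xs = [0, 0, 1, 1, 0, 1, 0, 1]
ys = [0, 0, 1, 1, 1, 0, 0, 1]
# Eq. (1b): both routes to I(X;Y) agree up to floating-point rounding.
print(mutual_information(xs, ys), entropy(xs) - conditional_entropy(xs, ys))
```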

The conditional mutual information between X and 
Y given Z is the mutual information between X and Y 
when Z is known: 



$$I(X;Y|Z) = H(X|Z) - H(X|Y,Z). \tag{2}$$
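The same plug-in approach gives a sketch of Eq. (2), again with a hypothetical helper name. The XOR example is our own, chosen for illustration. It shows a conditional mutual information larger than the unconditioned one: $Y$ alone says nothing about $X$, yet given $Z$ it determines $X$ completely.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy in bits of a list of hashable observations."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())


def conditional_mutual_information(xs, ys, zs):
    """I(X;Y|Z) = H(X|Z) - H(X|Y,Z), Eq. (2), written with joint entropies."""
    h_x_given_z = entropy(list(zip(xs, zs))) - entropy(zs)
    h_x_given_yz = entropy(list(zip(xs, ys, zs))) - entropy(list(zip(ys, zs)))
    return h_x_given_z - h_x_given_yz


# X = Y xor Z, with Y and Z independent and uniform (each combination once).
ys = [0, 0, 1, 1]
zs = [0, 1, 0, 1]
xs = [y ^ z for y, z in zip(ys, zs)]
print(conditional_mutual_information(xs, ys, zs))      # 1.0
print(conditional_mutual_information(xs, ys, [0] * 4))  # constant Z gives I(X;Y) = 0.0
```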



The entropy rate (denoted as $h_\mu$) [32] is the limiting value of the conditional entropy of the next state $x_{n+1}$ of $X$ given knowledge of the previous $k$ states $x_n^{(k)}$ of $X$ (up to and including time $n$, i.e. $x_{n-k+1}$ to $x_n$):

$$h_\mu = \lim_{k\to\infty} H(x_{n+1} \mid x_n^{(k)}). \tag{3}$$
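A finite-$k$ plug-in estimate of the quantity inside the limit of Eq. (3) can be sketched as follows. The helper is hypothetical and assumes a single stationary time series. For a sequence of temporal period $p$, the estimate reaches zero once the past block is long enough to fix the phase. Here that happens at $k = p - 1 = 2$ for $p = 3$.

```python
from collections import Counter
from math import log2


def conditional_entropy_given_past(seq, k):
    """Finite-k estimate of H(x_{n+1} | x_n^{(k)}) from one time series.

    x_n^{(k)} is the block x_{n-k+1}, ..., x_n of the k most recent states.
    """
    pairs = Counter()
    for n in range(k - 1, len(seq) - 1):
        pairs[tuple(seq[n - k + 1:n + 1]), seq[n + 1]] += 1
    pasts = Counter()
    for (past, _), c in pairs.items():
        pasts[past] += c
    total = sum(pairs.values())
    # H = sum over (past, next) of p(past, next) * log2(1 / p(next | past))
    return sum(c / total * log2(pasts[past] / c) for (past, _), c in pairs.items())


periodic = [0, 0, 1] * 50  # temporal period p = 3
for k in range(1, 4):
    print(k, round(conditional_entropy_given_past(periodic, k), 3))
# prints: 1 0.671, then 2 0.0, then 3 0.0
```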



III. LOCAL INFORMATION TRANSFER 

It is natural to look to information theory for the concept of information transfer. As such, we adopt transfer entropy from this realm and subsequently derive local transfer entropy from it. Additionally, we comment on the parameters of the transfer entropy, and present the concepts of apparent and complete transfer entropy, and self-information transfer.



A. Transfer Entropy 

As alluded to earlier, mutual information has been something of a de facto measure for information transfer in complex systems (e.g. [...]). A major problem, however, is that mutual information contains no inherent directionality. Attempts to address this include using the previous state of the "source" variable and the next state of the "destination" variable (known as time-lagged mutual information). However, Schreiber [19] points out that this ignores a more fundamental problem: mutual information measures the statically shared information between the two elements. (The same criticism applies to equivalent non-information-theoretical definitions such as that in [18].)



To address these inadequacies, Schreiber introduced transfer entropy [19]. This is the deviation from independence (in bits) of the state transition (from the previous state to the next state) of an information destination $X$ from the (previous) state of an information source $Y$:

$$T_{Y\to X} = \sum_{u_n} p(u_n) \log \frac{p(x_{n+1} \mid x_n^{(k)}, y_n^{(l)})}{p(x_{n+1} \mid x_n^{(k)})}. \tag{4}$$



Here $n$ is a time index and $u_n$ represents the state transition tuple $(x_{n+1}, x_n^{(k)}, y_n^{(l)})$. The blocks $x_n^{(k)}$ and $y_n^{(l)}$ represent the $k$ and $l$ past values of $x$ and $y$ up to and including time $n$, with $k = l = 1$ being the default choices. Schreiber points out that this formulation of the transfer entropy is a truly dynamic measure: it generalizes the entropy rate to more than one element to form a mutual information rate. The transfer entropy can also be viewed as a conditional mutual information [34] (see Eq. (2)). This casts it as the average information contained in the source about the next state $X'$ of the destination that was not already contained in the destination's past:

$$T_{Y\to X} = I(Y; X' \mid X) = H(X' \mid X) - H(X' \mid X, Y). \tag{5}$$

This could be interpreted (following [...]) as the diversity of state transitions in the destination minus assortative noise between those state transitions and the state of the source. Importantly, as an information-theoretic measure based on observational probabilities, the transfer entropy is applicable to both deterministic and stochastic systems.
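A minimal plug-in sketch of Eq. (5) for two discrete time series follows. The function name and the lagged-copy test signal are our own illustrative assumptions. Unlike the symmetric mutual information, the estimate is directional. It is close to 1 bit from the driving series to its copy and close to 0 bits in the reverse direction.

```python
import random
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy in bits."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())


def transfer_entropy(source, dest, k=1, l=1):
    """Plug-in estimate of T_{Y->X} = H(X'|X) - H(X'|X,Y), Eq. (5).

    X' is dest[n+1], X is the block x_n^{(k)}, Y is the block y_n^{(l)}.
    """
    rows = []
    for n in range(max(k, l) - 1, len(dest) - 1):
        rows.append((dest[n + 1],
                     tuple(dest[n - k + 1:n + 1]),
                     tuple(source[n - l + 1:n + 1])))

    def h(*cols):
        # Joint entropy of the selected columns (0: X', 1: X, 2: Y).
        return entropy([tuple(r[c] for c in cols) for r in rows])

    # H(X'|X) = H(X',X) - H(X);  H(X'|X,Y) = H(X',X,Y) - H(X,Y)
    return (h(0, 1) - h(1)) - (h(0, 1, 2) - h(1, 2))


rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(10_000)]
x = [0] + y[:-1]  # x_{n+1} = y_n: X copies Y with a one-step lag
print(transfer_entropy(y, x), transfer_entropy(x, y))
```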

Two recent studies have used transfer entropy: one to characterize information flow in sensorimotor networks [13], and one with respect to information closure [35]. We note the alternative perturbation-based candidate, information flow, which quantifies information transfer from the perspective of causality rather than prediction. We intend to compare transfer entropy to this measure in future work. Furthermore, a separate notion of information flow in CAs was introduced in [28] (connected to the local information, though not used for filtering). There are several fundamental problems with this formulation, however. It is only applicable to reversible CAs, and it only has meaning as information flow for deterministic mechanics. It also cannot distinguish information flow any more finely than information from the left and from the right.

In this paper, we accept Schreiber's formulation of transfer entropy (Eq. (4)) as a theoretically correct quantitative definition of information transfer, from a predictive or computational perspective. However, this quantitative definition has not yet been unified with the accepted specific instances of information transfer (e.g. particles in CAs). These instances are local in space and time, and investigating them requires a local measure of information transfer. In presenting local transfer entropy here, we seek to unify the apparently correct quantitative formulation of information transfer (i.e. transfer entropy) with accepted specific instances of information transfer.



B. Local Transfer Entropy 



To derive a local transfer entropy measure, we first note that Eq. (4) is summed over all possible state transition tuples $u_n = (x_{n+1}, x_n^{(k)}, y_n^{(l)})$, weighted by the probability of observing each such tuple. This probability $p(u_n)$ is operationally equivalent to the ratio of the count of observations $c(u_n)$ of $u_n$ to the total number of observations $N$ made: $p(u_n) = c(u_n)/N$. To compute this probability precisely, the ratio should be composed over all realizations of the observed variables (as described in [...]). Realistically, however, estimates will be made from a finite number of observations. We then replace the count by its definition $c(u_n) = \sum_{a=1}^{c(u_n)} 1$, and substitute $p(u_n) = \frac{1}{N}\sum_{a=1}^{c(u_n)} 1$ into Eq. (4):

$$T_{Y\to X} = \frac{1}{N} \sum_{u_n} \left( \sum_{a=1}^{c(u_n)} 1 \right) \log \frac{p(x_{n+1} \mid x_n^{(k)}, y_n^{(l)})}{p(x_{n+1} \mid x_n^{(k)})}. \tag{6}$$

The log term may then be brought inside this inner sum:

$$T_{Y\to X} = \frac{1}{N} \sum_{u_n} \sum_{a=1}^{c(u_n)} \log \frac{p(x_{n+1} \mid x_n^{(k)}, y_n^{(l)})}{p(x_{n+1} \mid x_n^{(k)})}. \tag{7}$$

This leaves a double sum running over each actual observation $a$ for each possible tuple observation $u_n$. That is equivalent to a single sum over all $N$ observations:

$$T_{Y\to X} = \frac{1}{N} \sum_{n=1}^{N} \log \frac{p(x_{n+1} \mid x_n^{(k)}, y_n^{(l)})}{p(x_{n+1} \mid x_n^{(k)})}. \tag{8}$$

It is clear, then, that the transfer entropy metric is a global average (or expectation value) of a local transfer entropy at each observation:

$$T_{Y\to X} = \left\langle t_{Y\to X}(n+1,k,l) \right\rangle, \tag{9a}$$

$$t_{Y\to X}(n+1,k,l) = \log \frac{p(x_{n+1} \mid x_n^{(k)}, y_n^{(l)})}{p(x_{n+1} \mid x_n^{(k)})}. \tag{9b}$$
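The derivation above shows that the averaged transfer entropy is the mean of per-observation log ratios. A minimal sketch under that reading computes every local value from pooled counts. The function name and the synthetic noisy channel are our own choices. The noise also produces the negative local values discussed in Section III C.

```python
import random
from collections import Counter
from math import log2


def local_transfer_entropy(source, dest, k=1, l=1):
    """Local values t_{Y->X}(n+1, k, l) of Eq. (9b), one per observation.

    Probabilities are plug-in frequencies pooled over the whole series, so the
    mean of the returned list equals the averaged estimate of Eq. (4).
    """
    obs = [(dest[n + 1], tuple(dest[n - k + 1:n + 1]), tuple(source[n - l + 1:n + 1]))
           for n in range(max(k, l) - 1, len(dest) - 1)]
    c_nxy = Counter(obs)                           # c(x_{n+1}, x_n^(k), y_n^(l))
    c_xy = Counter((xp, yp) for _, xp, yp in obs)  # c(x_n^(k), y_n^(l))
    c_nx = Counter((xn, xp) for xn, xp, _ in obs)  # c(x_{n+1}, x_n^(k))
    c_x = Counter(xp for _, xp, _ in obs)          # c(x_n^(k))
    return [log2((c_nxy[o] / c_xy[o[1], o[2]]) / (c_nx[o[0], o[1]] / c_x[o[1]]))
            for o in obs]


rng = random.Random(2)
y = [rng.randint(0, 1) for _ in range(5_000)]
# Noisy lagged copy: x_{n+1} = y_n, flipped with probability 0.1 (a stochastic channel).
x = [0] + [b ^ (rng.random() < 0.1) for b in y[:-1]]
local = local_transfer_entropy(y, x)
print(sum(local) / len(local))  # Eq. (9a): near 1 - H_b(0.1), about 0.53 bits
print(min(local))               # negative where the source was misleading (a flipped bit)
```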

The measure is local in that it is defined at each time $n$ for each destination element $X$ in the system and each causal information source $Y$ of the destination. This method of forming a local information-theoretic measure, by extracting the log term from a globally averaged measure, is used less explicitly for the local excess entropy [36], the local statistical complexity [23, 36], and the local information [28]. It is applicable to any such information-theoretic metric. For example, we form the local (time-lagged) mutual information between the source and destination variables from Eq. (1a) as:

$$m(y_n^{(l)}; x_{n+1}) = \log \frac{p(y_n^{(l)}, x_{n+1})}{p(x_{n+1})\,p(y_n^{(l)})}, \tag{10}$$

and similarly rewrite Eq. (5) as the expectation value of a local conditional mutual information: $T_{Y\to X} = \left\langle m(y_n^{(l)}; x_{n+1} \mid x_n^{(k)}) \right\rangle$.
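Extracting the log term from the time-lagged mutual information in the same way gives per-observation values. This sketch uses a hypothetical helper name and toy data of our choosing. It shows that a single local value can be negative even though the average cannot.

```python
from collections import Counter
from math import log2


def local_mutual_information(sources, dests):
    """Local values m(y; x) = log2[p(y, x) / (p(x) p(y))], as in Eq. (10).

    For the time-lagged form, pass sources = y[:-1] and dests = x[1:] so that
    each pair is (y_n, x_{n+1}).
    """
    n = len(sources)
    p_yx, p_y, p_x = Counter(zip(sources, dests)), Counter(sources), Counter(dests)
    return [log2((p_yx[y, x] / n) / ((p_x[x] / n) * (p_y[y] / n)))
            for y, x in zip(sources, dests)]


m = local_mutual_information([0, 0, 1, 1], [0, 0, 0, 1])
print(m)                # the pair (1, 0) gets a negative local value
print(sum(m) / len(m))  # the average is the (non-negative) mutual information
```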



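[Figure 1: schematic with axes "space/agents" and "time", marking the source block of agent $X_{i-j}$ over steps $n-l+1$ to $n$, the destination's past block of agent $X_i$ over steps $n-k+1$ to $n$, and the destination state at $n+1$.]

FIG. 1: Local transfer entropy $t(i,j,n+1,k,l)$ is the information transferred from an $l$-sized block of the source cell $X_{i-j}$ to the destination cell $X_i$ at time step $n+1$, conditioned on $k$ past states of the destination cell. Note: $|j| \le r$ for CAs.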



For lattice systems such as CAs, with spatially-ordered sources and destinations, we represent the local transfer entropy to cell $X_i$ from $X_{i-j}$ at time $n+1$ as:

$$t(i,j,n+1,k,l) = \log \frac{p(x_{i,n+1} \mid x_{i,n}^{(k)}, x_{i-j,n}^{(l)})}{p(x_{i,n+1} \mid x_{i,n}^{(k)})}. \tag{11}$$

Similarly, the local (time-lagged) mutual information can be represented as $m(i,j,n+1,l) = m(x_{i-j,n}^{(l)}; x_{i,n+1})$. Fig. 1 shows the local transfer entropy in a spatiotemporal system. The metrics are defined for every spatiotemporal destination $(i,n)$. This forms a spatiotemporal profile for every information channel or direction $j$, where sensible values for CAs lie within the cell range, $|j| \le r$. Note that $j$ represents the number of cells from the source to the destination; e.g. $j = 1$ denotes transfer across one cell to the right per unit time step. We use $T(j,k,l)$ to represent the average over all spatiotemporal points on the lattice.
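The lattice form suggests a direct implementation. The sketch below rests on our own assumptions: the function name, $l = 1$, a nested-list spacetime array and periodic boundaries. It is not the code used for the figures in this paper. It pools counts over all cells, as is appropriate for homogeneous agents, and is checked on a pure-shift lattice where the direction of transfer is known in advance.

```python
import random
from collections import Counter
from math import log2


def local_te_lattice(ca, j, k=1):
    """Local apparent transfer entropy t(i, j, n+1, k) of Eq. (11), with l = 1.

    ca[n][i] is the state of cell i at time n; boundaries are periodic. The
    source of cell i is cell i - j, and probabilities are pooled over every
    cell and time step (assuming homogeneous cells, as for CAs). For j = 0
    the k-block ends one step earlier so it does not contain the source,
    following the self-information convention of Section III D.
    """
    width = len(ca[0])
    lag = 1 if j == 0 else 0
    obs = {}
    for n in range(k - 1 + lag, len(ca) - 1):
        for i in range(width):
            past = tuple(ca[m][i] for m in range(n - k + 1 - lag, n + 1 - lag))
            obs[i, n + 1] = (ca[n + 1][i], past, ca[n][(i - j) % width])
    c_full = Counter(obs.values())
    c_ps = Counter((p, s) for _, p, s in obs.values())
    c_np = Counter((x, p) for x, p, _ in obs.values())
    c_p = Counter(p for _, p, _ in obs.values())
    return {pt: log2((c_full[o] / c_ps[o[1], o[2]]) / (c_np[o[0], o[1]] / c_p[o[1]]))
            for pt, o in obs.items()}


# Toy lattice: every row is the previous row shifted one cell to the left
# (ECA rule 170), so cell i copies its right-hand neighbour i + 1, i.e. j = -1.
rng = random.Random(3)
row0 = [rng.randint(0, 1) for _ in range(1_000)]
ca = [row0[n:] + row0[:n] for n in range(20)]
for j in (-1, 0, 1):
    profile = local_te_lattice(ca, j)
    print(j, sum(profile.values()) / len(profile))  # T(j, k=1): about 1 bit only for j = -1
```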

Importantly, the destination's own historical values can indirectly influence it via the source. This may be mistaken for an independent flow of information from the source. It is only possible in systems, such as CAs, with bidirectional information transfer. Such self-influence is a non-traveling form of information, in the same way as standing waves are a non-traveling form of energy. It is essentially static and can be viewed as the trivial part of information transfer. This non-traveling information is eliminated from the measurement by conditioning on the destination's history $x_{i,n}^{(k)}$. Yet any self-influence transmitted prior to these $k$ values will not be eliminated. We generalize comments on the entropy rate in [...] to suggest that taking the asymptote $k \to \infty$ is most correct for agents displaying non-Markovian dynamics (when considering their time-series in isolation). As such, we formalize the local transfer entropy as:

$$t(i,j,n+1,l) = \lim_{k\to\infty} \log \frac{p(x_{i,n+1} \mid x_{i,n}^{(k)}, x_{i-j,n}^{(l)})}{p(x_{i,n+1} \mid x_{i,n}^{(k)})}, \tag{12}$$

and similarly $t_{Y\to X}(n+1,l) = \lim_{k\to\infty} t_{Y\to X}(n+1,k,l)$ for a single source-destination pair. Computation at this limit is not feasible in general, so we retain $t_{Y\to X}(n,k,l)$ and $t(i,j,n,k,l)$ for estimation with finite $k$.

Also, we drop $l$ from the notation (e.g. $t(i,j,n)$ and $t(i,j,n,k)$) where the default setting of $l = 1$ is used to measure transfer from the single previous state only.



C. Complete and Apparent Transfer Entropy 

The averaged transfer entropy is constrained between 0 and $\log b$ bits, where $b$ is the number of possible states for a discrete system. As a conditional mutual information, it can be either larger or smaller than the corresponding mutual information [29]. The local transfer entropy, however, is not constrained, so long as it averages into this range. It can be greater than $\log b$ for a significant local information transfer, and can in fact also be measured to be negative. Local transfer entropy is negative where, in the context of the history of the destination, the source makes the actual next state of the destination less likely. That is, the probability of observing the actual next state given the value of the source, $p(x_{i,n+1} \mid x_{i,n}^{(k)}, x_{i-j,n})$, is lower than the probability of observing it independently of the source, $p(x_{i,n+1} \mid x_{i,n}^{(k)})$. In this case, the source element is actually misleading about the state transition of the destination. The source can be misleading in this way where other causal information sources influence the destination, or in a stochastic system. (Similarly, a local mutual information, Eq. (10), can be negative.)

Importantly, the transfer entropy may be conditioned on other possible information sources $Z$ [19], becoming $I(Y; X' \mid X, Z)$. This prevents their influence from being mistaken for that of the source $Y$. To be explicit, we label calculations conditioned on no other information contributors (e.g. Eq. (12)) as apparent transfer entropy.

For ECAs, conditioning on other possible information sources logically means conditioning on the other cells in the destination's neighborhood, which we know to be causal information contributors. First, we represent the joint values of the neighborhood of the destination $x_{i,n+1}$ as the set below. It excludes the source for the transfer entropy calculation, $x_{i-j,n}$, and the previous value of the destination, $x_{i,n}$:

$$v_{i,j,n} = \{\, x_{i+q,n} \mid \forall q : -r \le q \le +r,\ q \ne -j,\ q \ne 0 \,\}, \tag{13}$$

where $r$ is the range of causal information contributors (i.e. the cell range for CAs). We then derive local complete transfer entropy as the information contained in the source about the next state of the destination that was not contained in the destination's past or in other causal information sources $v_{i,j,n}$:

$$t^c(i,j,n+1) = \lim_{k\to\infty} \log \frac{p(x_{i,n+1} \mid x_{i,n}^{(k)}, x_{i-j,n}, v_{i,j,n})}{p(x_{i,n+1} \mid x_{i,n}^{(k)}, v_{i,j,n})}. \tag{14}$$



Again, $t^c(i,j,n,k)$ denotes finite-$k$ estimates. Eq. (14) specifically considers systems where only immediately previous source values can be causal information contributors. Under complete conditioning in such systems, $l > 1$ cannot add any information to the source. In other systems, Eq. (14) could be adjusted accordingly. In deterministic systems (e.g. ECAs), complete conditioning renders $t^c(i,j,n) \ge 0$: the information source cannot be misleading when all other causal information sources are considered. $T^c(j)$ represents the average over all spatiotemporal points on the lattice. Complete transfer entropy can be constructed for any system by conditioning out all causal information contributors apart from the information source under consideration.
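The following sketches Eqs. (13) and (14) for ECAs ($r = 1$) with finite $k$. The function name, the restriction to $j = \pm 1$ and the inline rule-110 run are illustrative assumptions. In a deterministic ECA, the destination's previous state, the source and $v_{i,j,n}$ together fix the next state. The numerator probability is therefore 1, and every local value is non-negative, as stated above.

```python
import random
from collections import Counter
from math import log2


def local_complete_te(ca, j, k=1):
    """Local complete transfer entropy t^c(i, j, n+1, k) of Eq. (14) for an ECA (r = 1).

    The source is cell i - j (j = +1 or -1), and v_{i,j,n} of Eq. (13) is the
    remaining neighbour i + j. Boundaries are periodic; probabilities are
    pooled over the whole lattice.
    """
    if j not in (-1, 1):
        raise ValueError("this sketch handles j = +1 or -1 only")
    width = len(ca[0])
    obs = {}
    for n in range(k - 1, len(ca) - 1):
        for i in range(width):
            past = tuple(ca[m][i] for m in range(n - k + 1, n + 1))
            source, other = ca[n][(i - j) % width], ca[n][(i + j) % width]
            obs[i, n + 1] = (ca[n + 1][i], past, source, other)
    c_all = Counter(obs.values())
    c_cond = Counter(o[1:] for o in obs.values())                 # (past, source, v)
    c_next_pv = Counter((o[0], o[1], o[3]) for o in obs.values())  # (next, past, v)
    c_pv = Counter((o[1], o[3]) for o in obs.values())            # (past, v)
    return {pt: log2((c_all[o] / c_cond[o[1:]]) /
                     (c_next_pv[o[0], o[1], o[3]] / c_pv[o[1], o[3]]))
            for pt, o in obs.items()}


rng = random.Random(4)
row = [rng.randint(0, 1) for _ in range(400)]
ca = []
for _ in range(100):  # ECA rule 110, periodic boundaries
    ca.append(row)
    row = [(110 >> (4 * row[i - 1] + 2 * row[i] + row[(i + 1) % 400])) & 1
           for i in range(400)]
tc = local_complete_te(ca, j=1, k=6)
print(min(tc.values()), max(tc.values()))  # never negative for a deterministic rule
```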



D. Summed Information Transfer Profiles 

We label the case $j = 0$ as self-information transfer, where the "source" is the immediate past value of the destination. We condition this calculation on the $k$ values before the $l$ source values, so as not to condition on the source. Self-information transfer computes the information contributed by the previous state of the given cell about its next state that was not contained in its prior history. It can be thought of as traveling information with an instantaneous velocity of zero. This is not a particularly useful quantity in and of itself. However, together with the transfer entropies for $j \ne 0$, it forms a useful profile in the summed local information transfer profiles. These are defined for apparent and complete transfer entropy respectively as:

$$t^s(i,n,k,l) = \sum_{j=-r}^{r} t(i,j,n,k,l), \tag{15a}$$

$$t^{sc}(i,n,k) = \sum_{j=-r}^{r} t^c(i,j,n,k). \tag{15b}$$
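Given local profiles for each channel (for instance from an estimator of Eq. (11) or (14)), Eq. (15) is a pointwise sum. The helper below is a hypothetical sketch, and the input layout is our own choice.

```python
def summed_profile(profiles):
    """Sum local profiles over channels j = -r..r, as in Eqs. (15a)/(15b).

    `profiles` maps each channel j to a dict {(i, n): local value}. Only points
    defined in every channel are summed, since profiles for different j (and
    j = 0 in particular) may start at different time steps.
    """
    common = set.intersection(*(set(p) for p in profiles.values()))
    return {pt: sum(p[pt] for p in profiles.values()) for pt in sorted(common)}


# Toy values (bits) for r = 1 at two lattice points, channels j = -1, 0, +1.
profiles = {
    -1: {(0, 5): 0.5, (1, 5): 0.0},
    0: {(0, 5): 0.25, (1, 5): 1.0},
    1: {(0, 5): 0.25, (1, 5): 0.0, (2, 5): 3.0},  # (2, 5) is missing elsewhere
}
print(summed_profile(profiles))  # {(0, 5): 1.0, (1, 5): 1.0}
```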

IV. RESULTS AND DISCUSSION 

The local transfer entropy metrics were studied with several important ECA rules. We investigate the variation of the profiles as a function of $k$, examine how the profiles change with ECA type, and compare the apparent and complete metrics. Each instance was run from an initial randomized state of 10 000 cells. The first 30 time steps were eliminated to allow the CA to settle, and a further 600 time steps were captured for investigation. All results were confirmed by at least 10 runs from different initial randomized states, and periodic boundary conditions were used. We fixed $l$ at 1. Values of $l > 1$ are irrelevant for the complete metric when applied to CAs, and for the apparent metric we are interested only in information directly transferred at the given local time step. For spatially-ordered systems with homogeneous agents such as CAs, it is appropriate to estimate the probability distributions of a channel from all spatiotemporal observations of that channel (i.e. from the whole CA), rather than only from the source-destination pair under measurement.
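The protocol above (random initial row, discarded transient, periodic boundaries) can be sketched as follows. The helper name and the seed handling are our own assumptions. The rule lookup follows the Wolfram numbering convention cited in Section I.

```python
import random


def run_eca(rule, width, transient, steps, seed=None):
    """Run an elementary CA with the given Wolfram rule number (0-255).

    Starts from a uniformly random row, uses periodic boundaries, discards the
    first `transient` time steps and returns the next `steps` rows.
    """
    rng = random.Random(seed)
    row = [rng.randint(0, 1) for _ in range(width)]
    rows = []
    for t in range(transient + steps):
        if t >= transient:
            rows.append(row)
        # The neighbourhood (left, centre, right), read as a 3-bit number,
        # selects the corresponding bit of the rule number.
        row = [(rule >> (4 * row[i - 1] + 2 * row[i] + row[(i + 1) % width])) & 1
               for i in range(width)]
    return rows


# The setup described above would be run_eca(110, 10_000, 30, 600); a smaller
# run keeps the example quick.
ca = run_eca(110, width=200, transient=30, steps=60, seed=0)
print(len(ca), len(ca[0]))  # 60 200
```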

We concentrate on two rules; the rule classification here is from [21]. Rule 110 is a complex rule with several configurations of regular particles, or gliders. Rule 18 is a chaotic rule with irregular particles, or domain walls. The selection of these particular rules allows comparison with the results of other filtering techniques. We expect local information transfer profiles to highlight both regular and irregular particles, since these are the important elements of structure in CAs that are conjectured to be the information transfer agents.



A. Base comparison cases 

For rule 110, the raw states of a sample CA run are displayed in Fig. 2(a) (all figures were generated using modifications to [37]). As base cases, we measured (time-lagged) local mutual information $m(i,j,n)$, and local apparent and complete transfer entropies with the default value of $k = 1$: $t(i,j,n,k=1)$ and $t^c(i,j,n,k=1)$. The base comparison case of local mutual information is analogous to that with globally averaged measures in [19], yet here the local profiles yield a more detailed contrast than averages do. Note that $k = 1$ is the only value used in [19] (in less coupled systems) and in the later applications of the transfer entropy in [13, 34, 35]. The local profiles generated with $j = 1$ (i.e. one cell to the right per unit time) for these base cases are shown in Fig. 2(b) to (d). However, these measures cannot distinguish gliders from the background here any more clearly than the raw CA plot itself. (The negative components of $m(i,j,n)$ and $t(i,j,n,k=1)$, not shown, are similarly unhelpful.) These basic metrics were also unsuccessful with other values of $j$ and with other CA rules. This explicitly demonstrates that they are not useful as measures of information transfer in complex systems.



B. Gliders as dominant information transfer agents 

Experimentally, we find our expectation met, with gliders highlighted as the dominant information transfer agents against the domain, once $k \ge 6$ for ECA rule 110. This holds for both the complete and apparent metrics, in both channels $j = 1$ and $j = -1$. Fig. 3 displays the local complete transfer entropy profiles computed here using $k = 6$ (we return to examine the apparent metric in Section IV D). Note that each measure attributes higher values of local complete transfer entropy to the gliders moving in the same macroscopic direction as the direction of information transfer being measured, as is expected from such measures. Also, the summed local complete transfer entropy in Fig. 3(b) gives a filtered plot very similar to that found for rule 110 using other techniques (see [...]). Relying on the average transfer entropy values alone does not provide these details (see Section IV E).

FIG. 2: Base comparison metrics incapable of quantifying local information transfer (one cell to the right). Application to raw states of ECA Rule 110 shown in (a) (86 time steps displayed for 86 cells; time increases down the page for all CA plots): (b) Local (time-lagged) mutual information $m(i,j=1,n)$, positive values only (all figures scaled with 16 colors), with max. 0.48 bits (black), min. 0.00 bits (white); (c) Local complete transfer entropy $t^c(i,j=1,n,k=1)$, max. 1.28 bits (black), min. 0.00 bits (white); (d) Local apparent transfer entropy $t(i,j=1,n,k=1)$, positive values only, max. 0.67 bits (black), min. 0.00 bits (white).



Fig. 4(a) displays a close-up example of a right-moving glider in ECA rule 110. Applying the local complete transfer entropy (Fig. 4(b)) reveals that this glider is composed of a repeating series of two consecutive information transfers to the right followed by a pause. Although one might initially suggest that the glider structure includes the points marked "x", careful consideration of exactly where a source can add information to that contained in the past of the domain suggests otherwise. Consider the point one cell to the left of those marked "x", the second of the two consecutive transfers to the right. To compute $t^c(i,j=1,n+1,k=6)$ (one cell to the right) at this point, we first compute $p(x_{i,n+1} \mid x_{i,n}^{(k=6)}, x_{i-1,n}, d_{i,j=1,r,n}) = 1.0$ (since the system is deterministic) and $p(x_{i,n+1} \mid x_{i,n}^{(k=6)}, d_{i,j=1,r,n}) = 0.038$. The local transfer entropy is high here because the probability of observing the actual next state of the destination is much higher when the source is taken into account than when it is not; correspondingly, using Eq. (14) we have $t^c(i,j=1,n+1,k=6) = 4.7$ bits at this point. The points marked "x", however, are effectively predictable from the temporal pattern of the preceding domain, and so do not contain significant information transfer. Interestingly, the points containing significant information transfer are not necessarily the same as those selected as particles by other filtering methods; e.g., finite state transducers (using left-to-right scanning by convention [25]) would identify points two cells to the right of those marked "x" as part of the glider.

FIG. 3: Local transfer entropy with $k = 6$ highlights gliders. Application to raw states of ECA Rule 110 in (a) (86 time steps for 86 cells): (b) Summed local complete transfer entropy profile $t^c_s(i,n,k=6)$, max. 8.22 bits (black), min. 0.00 bits (white); (c) Local complete transfer entropy $t^c(i,j=1,n,k=6)$ (one cell to the right), max. 4.95 bits, min. 0.00 bits; (d) Local complete transfer entropy $t^c(i,j=-1,n,k=6)$ (one cell to the left), max. 6.72 bits, min. 0.00 bits.
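The 1.0 versus 0.038 calculation above can be reproduced numerically. Below is a minimal Python sketch, assuming a binary radius-1 ECA with periodic boundaries and plug-in (frequency-count) estimates of the PDFs pooled over every cell and time step; `run_eca` and `local_transfer_entropy` are hypothetical helper names, and this is not presented as the authors' implementation. Because the system is deterministic and the complete form conditions on the whole neighborhood, its numerator probability is always 1, exactly as in the worked example.

```python
import math
import random
from collections import Counter


def run_eca(rule, init, steps):
    """Evolve a binary radius-1 elementary CA with periodic boundaries.

    Returns rows, where rows[n][i] is the state of cell i at time step n."""
    table = [(rule >> b) & 1 for b in range(8)]  # output indexed by 4*left + 2*centre + right
    width = len(init)
    rows = [list(init)]
    for _ in range(steps - 1):
        prev = rows[-1]
        rows.append([table[4 * prev[(i - 1) % width] + 2 * prev[i] + prev[(i + 1) % width]]
                     for i in range(width)])
    return rows


def local_transfer_entropy(rows, j, k):
    """Plug-in local apparent t(i,j,n+1,k) and complete t^c(i,j,n+1,k), in bits.

    The source is cell i-j and the destination is cell i. For radius 1 the only
    other causal contributor, conditioned on by the complete form, is cell i+j.
    Probabilities are frequency counts pooled over all cells and time steps.
    Returns {(i, n+1): (apparent, complete)}."""
    width = len(rows[0])
    obs = []
    for n in range(k - 1, len(rows) - 1):
        for i in range(width):
            past = tuple(rows[m][i] for m in range(n - k + 1, n + 1))  # x_{i,n}^{(k)}
            source = rows[n][(i - j) % width]
            other = rows[n][(i + j) % width]
            obs.append(((i, n + 1), past, source, other, rows[n + 1][i]))

    counts = Counter()
    for _, past, s, o, x in obs:
        for key in (("psx", past, s, x), ("ps", past, s), ("px", past, x), ("p", past),
                    ("psox", past, s, o, x), ("pso", past, s, o),
                    ("pox", past, o, x), ("po", past, o)):
            counts[key] += 1

    def cond(num, den):
        return counts[num] / counts[den]

    profile = {}
    for point, past, s, o, x in obs:
        apparent = math.log2(cond(("psx", past, s, x), ("ps", past, s))
                             / cond(("px", past, x), ("p", past)))
        complete = math.log2(cond(("psox", past, s, o, x), ("pso", past, s, o))
                             / cond(("pox", past, o, x), ("po", past, o)))
        profile[point] = (apparent, complete)
    return profile


random.seed(0)
rows = run_eca(110, [random.randint(0, 1) for _ in range(200)], 100)
right = local_transfer_entropy(rows, j=1, k=6)   # transfer one cell to the right
left = local_transfer_entropy(rows, j=-1, k=6)   # transfer one cell to the left
summed = {point: right[point][1] + left[point][1] for point in right}  # t^c_s(i, n, k)
```

Summing the complete values of the two channels gives a profile analogous in spirit to the summed plots of Figs. 3(b) and 5(a); this toy run is far too small for reliable estimates at large $k$.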

To understand why $k \geq 6$ was useful in this case, we consider an infinite temporally periodic domain, with period, say, $p$. (This extends the demonstration in [19] of zero average transfer in a lattice of spatial and temporal period 2 using $k = 1$ to a domain of arbitrary period.) For the time series of a single cell there, the number of states an observer must examine to have enough information to determine the next state is limited by the period $p$ (as per the synchronization time $\tau$ in [3c]). Local transfer entropy measurements with $k \geq p - 1$ would therefore not detect any additional information from the neighbors about the next state of the destination beyond what is already contained in these $k$ previous states (correctly inferring zero transfer). Using $k < p - 1$, on the other hand, may attribute the non-traveling self-influence of the destination to the source. Taking $k \geq p - 1$ provides a sufficient (Markovian) condition for eliminating this non-traveling information in an infinite periodic domain, rather than requiring the full asymptote $k \to \infty$. Establishing a minimal condition is related to the synchronization time $\tau$ for the entropy rate [3e], though it is slightly more complicated here because we need to consider the source cell.

FIG. 4: Close-up example of a glider in ECA Rule 110 (x's and o's used only for visual alignment). 18 time steps displayed for 12 cells: (a) Raw CA; (b) Local complete transfer entropy $t^c(i,j=1,n,k=6)$ (one cell to the right), maxima in view 4.70 bits (gray), minima 0.00 bits (white); (c) Local apparent transfer entropy $t(i,j=-1,n,k=6)$ (one cell to the left), negative values only, minima in view -2.04 bits (gray), maxima 0.00 bits (white).
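The sufficiency of $k \geq p - 1$ can be checked numerically. One way to see it: any $p$ consecutive states of a period-$p$ series contain every element of the period exactly once, so the $p - 1$ most recent states fix the next one (for binary states, the next state equals the number of 1s per period minus the number of 1s in the window). The Python sketch below uses an arbitrary, hypothetical binary pattern of period 7 (not the actual rule 110 domain) and a plug-in estimate of the conditional entropy of a single cell's next state given its $k$ previous states; `next_state_entropy` is a hypothetical helper.

```python
import math
from collections import Counter


def next_state_entropy(seq, k):
    """Plug-in estimate of H(x_{n+1} | x_n^{(k)}) in bits, for k >= 1."""
    joint, past = Counter(), Counter()
    for n in range(k - 1, len(seq) - 1):
        history = tuple(seq[n - k + 1:n + 1])
        joint[(history, seq[n + 1])] += 1
        past[history] += 1
    total = sum(joint.values())
    # Each term is >= 0, so a fully predictable series returns exactly 0.0.
    return sum(c / total * math.log2(past[h] / c) for (h, _), c in joint.items())


pattern = [0, 0, 0, 1, 0, 1, 1]  # hypothetical single-cell pattern with period p = 7
series = pattern * 100           # one cell's time series inside an infinite periodic domain
for k in range(1, len(pattern) + 1):
    print(k, round(next_state_entropy(series, k), 3))
```

For this particular pattern the estimate already vanishes from $k = 3$ onward, illustrating that $p - 1$ is a sufficient rather than a minimal history length; the minimal value depends on how quickly an observer synchronizes to the pattern.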

However, a minimal correct value for $k$ does not exist in general for a given system with bidirectional communication. The above argument was only applicable to domains which are periodic and infinite, and the existence of any gliders prevents a periodic domain from being infinite. Where the history of a given destination includes encountering gliders at some point, this partial knowledge of nearby glider activity is an important component in the probability distribution of the next state of that destination. Yet there is no limit on how far into the future a previous glider encounter may influence the states of a destination (because of the system's capacity for bidirectional communication). That is to say, there is no Markovian condition for eliminating the non-traveling information in general in such systems; as such, the limit $k \to \infty$ should be taken in measuring the transfer entropy. While using only the condition $k \geq p - 1$ is not completely correct, it will eliminate the non-traveling information in the domain pertaining to the periodic structure only. Where this part is dominant in the domain, as for ECA rule 110 here, the gliders are likely to be highlighted against the periodic domain with $k \geq p - 1$. (This could be considered a rule of thumb for determining a minimum useful $k$.)

FIG. 5: Estimating local transfer entropy profiles for $k \to \infty$ with $k = 16$ for the raw states of ECA Rule 110 in Fig. 3(a) (86 time steps for 86 cells): (a) Summed local complete transfer entropy profile $t^c_s(i,n,k=16)$, max. 14.7 bits (black), min. 0.00 bits (white); (b) Local complete transfer entropy $t^c(i,j=1,n,k=16)$ (one cell to the right), max. 9.99 bits, min. 0.00 bits; (c) Local complete transfer entropy $t^c(i,j=-1,n,k=16)$ (one cell to the left), max. 10.1 bits, min. 0.00 bits; (d) Local apparent transfer entropy $t(i,j=-1,n,k=16)$ (one cell to the left), positive values only, max. 10.4 bits, min. 0.00 bits.

While the results for $k = 6$ visually correlate with previous filtering work, using the limit $k \to \infty$ would be more correct. Achieving this limit is not computationally feasible, but reasonable estimates of the probability distributions can be made: Fig. 5 displays the local complete transfer entropy profiles computed for ECA rule 110 using $k = 16$. These plots now highlight information transfer almost exclusively in the direction of the macroscopic glider motion, which is even more closely aligned with our expectations than was seen for $k = 6$. Importantly, much less of each glider is highlighted than for $k = 6$ or other techniques, and the larger values of transfer entropy are concentrated around the leading time-edges of the gliders. This suggests that the leading glider edges determine much of the following dynamics, which then comprise mainly non-traveling information. Note also that the "vertical" glider (at the left of Fig. 3(b), with spatial velocity zero) is no longer highlighted. Its cell states are effectively predictable from their past, as becomes observable once $k$ exceeds its vertical period.



Another interesting effect of the existence of gliders is that the next state of a cell in the domain is not completely determined by its periodic history. The neighboring information sources can add information about the next state of that destination by signaling whether or not a glider is incoming. That is to say, it is possible to measure non-zero information transfer inside finite domains, effectively indicating the absence of a glider (i.e., that the domain shall continue). For ECA rule 110 in Fig. 3, we do in fact measure small but non-zero information transfer at certain points in the periodic background domain (small enough to appear to be zero in the plots). These values tend to be stronger in the wake of real gliders: since gliders are often followed by others, there is a stronger indication of their absence. Consider the points in the periodic domain marked by "o" in Fig. 4(a); these have the same history as the previously discussed points of high information transfer, and their neighborhood (excluding the source on the left) is also the same. Here, we compute $p(x_{i,n+1} \mid x_{i,n}^{(k=6)}, x_{i-1,n}, d_{i,j=1,r,n}) = 1.0$ and $p(x_{i,n+1} \mid x_{i,n}^{(k=6)}, d_{i,j=1,r,n}) = 0.96$: the probability of observing the actual next state of the destination becomes slightly higher when the information source on the left is taken into account. Because the context is the same as at the glider point, this 0.96 is approximately the complement of the 0.038 found there: at the "o" points the destination takes the more probable next state. As such, we have $t^c(i,j=1,n+1,k=6) = 0.057$ bits to the right at this point, demonstrating the possibility of small non-zero information transfer in the periodic domain. This effect occurs for both the complete and apparent measures and is not a finite-$k$ effect.

Also, note in Fig. 5 that there is some information transfer in the orthogonal direction for each glider. Some of this is expected to vanish as $k \to \infty$, yet some will remain for a similar reason to the non-zero transfer in domains, i.e., considering the source does add information about the next state of the destination. Importantly, this orthogonal transfer is not as significant as that in the macroscopic glider direction in terms of magnitude and coherence.

Given these effects, we describe gliders as the domi- 
nant, as opposed to the only, information transfer agents 
here. (These findings have also been verified for ECA 
rule 54, another complex rule containing gliders.) While 
these profiles appear similar to other filtering work in 
some respects, it is only local transfer entropy profiles 
that provide quantitative evidence that gliders are the 
dominant information transfer agents in CAs. 



C. Domain walls as dominant information transfer 
agents 

We also investigated ECA rule 18, known to contain domain walls against the background. Application of local complete transfer entropy to the sample run in Fig. 6(a) highlights the domain walls as containing strong information transfer in each channel (e.g., see $t^c(i,j=1,n,k=16)$ in Fig. 6(c)). A full picture is given by the summed profile in Fig. 6(b): as expected, our results quantitatively confirm the domain walls as dominant information transfer agents against the domain. We have observed similar results for ECA rule 146.

Domain walls involve the meeting of two domains which are out of phase: motion of the wall can be viewed as one domain intruding into the other. At such points, we observe high transfer entropy in the direction of movement because the information source (as part of the intruding domain) adds much information about the next state of the destination that was not in the destination's past or the rest of the CA neighborhood. This highlighting of the domain walls is somewhat similar to that produced by other filtering techniques, although an important distinction from @, H^, [H| is that this technique highlights the domain wall areas as being only a single cell wide: as described above, a single cell width is all that is required to explain the meeting of two domains of rule 18 from a temporal perspective.

We also applied these measures to ECA rule 22 (not shown), whose raw-state plots appear similar to those of rule 18 at first glance. However, this rule has not been found to contain structure such as domain walls [23]. Consistent with those results, local transfer entropy measures significant information transfer at many points in the CA, but does not find any coherent structure to this transfer.

FIG. 6: Local transfer entropy profiles for raw states of ECA Rule 18 in (a) (55 time steps for 55 cells displayed) highlight domain walls: (b) Summed local complete transfer entropy profile $t^c_s(i,n,k=16)$, max. 13.5 bits (black), min. 0.00 bits (white); (c) Local complete transfer entropy $t^c(i,j=1,n,k=16)$ (one cell to the right), max. 14.9 bits, min. 0.00 bits; (d) Local apparent transfer entropy $t(i,j=1,n,k=16)$ (one cell to the right), positive values only, max. 11.9 bits, min. 0.00 bits.

Importantly, the domain of rule 18 contains a significant level of information transfer. In fact, there is a pattern to the transfer in the domain of spatial and temporal period 2, which corresponds very well to the period-2 spatial ε-machine generated to recognize the domain of rule 18 in [22]. Every second site of the domain in the raw CA is a "0", and the alternate site is either a "0" or a "1" (depending on the neighborhood configuration). At every second site with the "0" values, there is vanishing local complete information transfer (for either incoming channel $j = 1$ or $-1$) because the state of the cell is completely predictable from this temporal periodic pattern in its past. At the alternate sites, the local complete information transfer is approximately 1 bit from both incoming channels $j = 1$ and $-1$ (by limited inspection, the measurements were between 0.96 and 1.04 bits with $k = 16$). At these points (in an infinite domain), both alternative next states are equally likely (in the context of the destination's past and the rest of the CA neighborhood) before considering the source; when it is considered, the next state is determined and 1 bit of information is added.



D. Apparent transfer entropy 

Profiles generated with the local apparent transfer en- 
tropy contain many of the same features as those for 
the complete metric: gliders and domain walls are high- 
lighted as the dominant information transfer agents in 
their direction of motion; large values of k are required 
to reasonably approximate the probability distribution 
functions; and non-zero information transfer is still pos- 
sible in domains and in orthogonal directions to macro- 
scopic glider motion. 



For ECA rule 110, Fig. 5(d) displays the positive values of $t(i,j=-1,n,k=16)$ (one cell to the left), which appear almost identical to the corresponding profile for the complete metric in Fig. 5(c). The summed apparent profile (not shown) is also very similar to the summed complete profile in Fig. 5(a). A major distinction is observed, however, when examining negative values of the apparent profiles: when measured in a direction orthogonal to macroscopic glider motion, the apparent metric can report negative as well as positive values (see Fig. 4(c)). Negative values occur where the source, still part of the domain, is misleading about the next state of the destination.

As an example, consider the glider in Fig. 4(a). At the positions to the left of those marked "x", we confirm a strong positive value for the local apparent transfer entropy $t(i,j=1,n+1,k=6)$ (2.65 bits), as per the complete metric. However, Fig. 4(c) displays large negative values of $t(i,j=-1,n+1,k=6)$ (the channel orthogonal to glider motion) at these same positions. There we compute $p(x_{i,n+1} \mid x_{i,n}^{(k=6)}, x_{i+1,n}) = 0.038$ and $p(x_{i,n+1} \mid x_{i,n}^{(k=6)}) = 0.16$. The local apparent transfer entropy is negative here because the probability of observing the actual next state of the destination is much lower when the source on the right is taken into account than when it is not (i.e., the source is misleading). As such, Eq. (11) gives $t(i,j=-1,n+1,k=6) = -2.05$ bits at this point. Compare this to the complete metric for this channel, $t^c(i,j=-1,n+1,k=6)$, which measures 0.00 bits here because the source at the right (still in the domain) cannot add any information not contained in the other neighbor (which drives the glider). Note that the local apparent transfer entropy in the direction of glider motion (2.65 bits) was more strongly informative than that in the orthogonal direction was misleading (-2.05 bits). Also, note that negative values of the local metric are not found for the orthogonal direction at every point in the glider.
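The three worked examples in this section all reduce to the logarithm of a ratio of two conditional probabilities of the destination's actual next state: positive when including the source makes that state more probable, negative when it makes it less probable. A short Python check of the arithmetic (the helper `local_te` is a hypothetical name) gives about 4.72, 0.059 and -2.07 bits, close to the 4.7, 0.057 and -2.05 bits reported; the small differences presumably arise because the quoted probabilities are themselves rounded.

```python
import math


def local_te(p_with_source, p_without_source):
    """Local transfer entropy in bits: log2 of the ratio between the probability of the
    destination's actual next state with and without the source in the context."""
    return math.log2(p_with_source / p_without_source)


# Rounded conditional probabilities quoted in the text for ECA rule 110 with k = 6.
glider = local_te(1.0, 0.038)       # complete, j = 1, inside the glider   (text: 4.7 bits)
domain = local_te(1.0, 0.96)        # complete, j = 1, domain point "o"    (text: 0.057 bits)
orthogonal = local_te(0.038, 0.16)  # apparent, j = -1, orthogonal channel (text: -2.05 bits)
print(f"{glider:.3f} {domain:.3f} {orthogonal:.3f}")
```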

Another distinction is observed for ECA rule 18. As expected, the apparent metric identifies high positive transfer entropy in the direction of domain wall motion (see Fig. 6(d) for the $j = 1$ channel), and negative transfer entropy in the direction orthogonal to domain wall motion (not shown). However, the apparent metric finds vanishing transfer entropy throughout the domain (for both channels $j = -1$ and $1$), in stark contrast to the periodic pattern found with the complete metric. At every second site with the "0" values, the state of the destination is completely predictable from its past, so we have $t = 0$ bits, as for $t^c$. However, at the alternate sites both possible next states are equally likely in the context of the destination's history and remain so when considering the source: as such, we find $t = 0$. It is only when including the rest of the neighborhood in the context (with the complete metric) that one observes the source to be adding 1 bit of information. This example brings to mind the discussion on the nature of information transfer in complex versus chaotic dynamics H [H, EH EBl, and suggests that perhaps in chaotic dynamics, where many sources influence outcomes in a non-coherent manner, the complete metric may indicate large information transfer whereas the apparent metric does not (because other sources obscure the contribution of the source under consideration).
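The contrast at the alternate sites of the rule 18 domain can be reproduced with a toy calculation. When the destination's current state is "0", rule 18 outputs 1 only for the neighborhoods 100 and 001, so its next state is the XOR of its two neighbors. The Python sketch below assumes, as an idealization rather than a measurement, that the two neighbors are independent fair bits and that the destination's periodic past carries no information about them, which lets the past be dropped from the context; `rule18_alternate_site` is a hypothetical helper, and the source is taken on the left ($j = 1$).

```python
import math
from itertools import product

# Rule 18 lookup table, indexed by 4*left + 2*centre + right.
RULE18 = [(18 >> b) & 1 for b in range(8)]


def rule18_alternate_site():
    """Local complete and apparent transfer entropy from the left neighbor (source)
    at an idealized rule 18 domain site whose current state is 0.

    Assumes (hypothetically) that both neighbors are independent fair bits and that
    the destination's past carries no information about them.
    Returns a list of (left, right, next, complete, apparent) tuples."""
    # Index 0 = left (source), 1 = right (other neighbor), 2 = destination's next state.
    configs = [(l, r, RULE18[4 * l + r]) for l, r in product((0, 1), repeat=2)]

    def prob(next_state, given):
        """p(next = next_state | variables at the given indices take the given values)."""
        pool = [c for c in configs if all(c[idx] == val for idx, val in given.items())]
        return sum(1 for c in pool if c[2] == next_state) / len(pool)

    results = []
    for l, r, x in configs:
        complete = math.log2(prob(x, {0: l, 1: r}) / prob(x, {1: r}))
        apparent = math.log2(prob(x, {0: l}) / prob(x, {}))
        results.append((l, r, x, complete, apparent))
    return results


for row in rule18_alternate_site():
    print(row)
```

Every configuration gives 1 bit for the complete form and 0 bits for the apparent form: the source alone says nothing about an XOR output, but becomes decisive once the other neighbor is known.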

The apparent and complete metrics are clearly capa- 
ble of producing different insights under certain circum- 
stances, and both viewpoints are valuable. We are cur- 
rently investigating an application of the apparent trans- 
fer entropy in combination with a measure of information 
storage to identify information modification [3^ |. 



E. Averaged transfer entropies 

We compute the averaged transfer metrics as a function of $k$ for ECA rule 110 in Fig. 7, to check whether similar insights can be gained from this trend. In fact, only limited insights are gained here. The average complete transfer entropies decrease with $k$: an increase is impossible because we condition out more of the information that appears to come from the source. The average apparent transfer entropy can, however, show increases with $k$; this is possible with a three-term entropy [29] where other information sources are not taken into account. None of these reach a limiting value for the extent of $k$ measured, suggesting again that $k \to \infty$ should be used. Realistically, $k$ is limited (e.g., to $k = 16$ in previous sections) by the sample size, so as to retain a sufficient number of observations per configuration to reasonably estimate the probability distribution functions.

FIG. 7: Average transfer entropies versus conditioning length $k$, plotted for complete and apparent transfer entropies in channels $j = 1$ and $-1$ in ECA rule 110.

The local metrics clearly reveal much about the information dynamics of a system that their averages do not. In particular, these averages tell us nothing of the presence of glider particles, let alone that gliders would be clearly highlighted once $k \geq 6$. Also, while the average apparent and complete metrics appear to converge to a similar value in each channel, this belies the important distinctions between them discussed earlier.
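Since the averaged transfer entropy is the expectation of the local values over all spatiotemporal points, very different local profiles can share the same average. A toy Python illustration with hypothetical numbers (not data from Fig. 7): a single diagonal track carrying 5 bits per point, like an idealized glider, and a uniform 0.5 bits everywhere both average to 0.5 bits.

```python
def average_transfer(profile):
    """Average transfer entropy as the mean of local values over all (time, cell) points."""
    values = [v for row in profile for v in row]
    return sum(values) / len(values)


size = 10
# Hypothetical local profiles in bits: rows are time steps, columns are cells.
glider = [[5.0 if i == n else 0.0 for i in range(size)] for n in range(size)]  # one diagonal track
diffuse = [[0.5] * size for _ in range(size)]  # the same total spread evenly

print(average_transfer(glider), average_transfer(diffuse))
```

Both averages print as 0.5, even though only the first profile contains a coherent traveling structure.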



V. CONCLUSION 

We have presented a local formulation of the transfer 
entropy in order to characterize the information transfer 
into each spatiotemporal point in a complex system. Lo- 
cal transfer entropy presents insights that cannot be ob- 
tained using the averaged measure alone, in particular in 
providing these spatiotemporal information transfer pro- 
files as an analytic tool. Importantly, the local transfer 
entropy allowed us to study the transfer entropy metric 
itself, including the importance of appropriate destination conditioning lengths $k$ (e.g., that using $k \to \infty$ is most correct), and to contrast the apparent and complete forms which were introduced here.

On applying the local transfer entropy to cellular automata, we demonstrated its utility as a valid filter for coherent structure. It is novel in comparison to other filtering methods previously presented for CAs. It provides continuous rather than discrete values (like [28] and [23]). It does not follow an arbitrary spatial preference (unlike [28] and [22]) but rather the flow of time only. As described for local statistical complexity in [23], local transfer entropy does not require a new filter for every CA, but the probability distribution functions must be recalculated for every CA. Perhaps most importantly, it
provides multiple views of information transfer in each 
generic channel or direction (which no other filters do), 
and also provides a combined view which matches many 
important features highlighted by other filters. Finally, it highlights subtly different parts of emergent structure than other filters do: the leading glider edges facilitating the information transfer; only the minimal part of domain walls necessary to identify them; and different points as constituting the particles, due to our temporal approach. It also does not highlight vertical gliders, since they do not constitute traveling information.

Most significantly, local transfer entropy provided the 
first quantitative support for the long-held conjecture 
that particles (both gliders and domain walls) are the 
information transfer agents in CAs. This is particu- 
larly important because of analogies between particles in 
CAs and coherent structure or hypothesized information 
transfer agents in physical systems, such as traveling localizations caused by dipole-dipole interactions in microtubules [1] and in soliton dynamics [13]. This formulation of local transfer entropy is ready to be applied beyond CAs to systems such as these (including stochastic systems), where it may provide evidence for similar conjectures about information transfer therein.

This result is important in bringing together the quan- 
titative definition of information transfer (transfer en- 
tropy) with the popular understanding of the concept 
through widely-accepted instances (such as particles in 
CAs). The result therefore completes the establishment 
of transfer entropy as the appropriate measure for (predictive) information transfer in complex systems. A comparison should be made with a localization of the "information flow" metric [34] in future work, in order to explore the differences between its causal perspective and
the predictive or computational perspective of transfer 
entropy. In doing so, the limitations of the transfer en- 
tropy metric must be considered. These include that the 
transfer entropy should consider only causal information 
contributors as the source and as other information con- 
tributors to be conditioned on (in the complete metric). 
Considering non-causal sources (e.g. outside the neigh- 
borhood in CAs) has the potential to mistake correlation 
for information transfer, and conditioning on non-causal 
elements could cause information that was actually part 
of the transfer to be disregarded. 

Finally, we are building on this investigation to describe local measures of information storage and modification, also within a complete local framework for information dynamics in complex systems (see [39]).